22 research outputs found

    An investigation into the impact of controlled English rules on the comprehensibility, usefulness and acceptability of machine-translated technical documentation for French and German users

    Previous studies suggest that the application of Controlled Language (CL) rules can significantly improve the readability, consistency, and machine-translatability of source text. One of the justifications for the application of CL rules is that they can have a similar impact on several target languages by reducing the post-editing effort required to bring Machine Translation (MT) output to acceptable quality. In certain situations, however, post-editing services may not be a viable solution. Web-based information is often expected to be made available in real time to ensure that its access is not restricted to certain users based on their locale. Uncertainties remain with regard to the actual usefulness of MT output for such users, as no empirical study has examined the impact of CL rules on the usefulness, comprehensibility, and acceptability of MT technical documents from a Web user's perspective. In this study, a two-phase approach is used to determine whether Controlled English rules can have a significant impact on these three variables. First, individual CL rules are evaluated within an experimental environment, which is loosely based on a test suite. Two documents are then published and subjected to a randomised evaluation within the framework of an online experiment using a customer satisfaction questionnaire. The findings indicate that a limited number of CL rules have a similar impact on the comprehensibility of French and German output at the segment level. The results of the online experiment show that the application of certain CL rules has the potential to significantly improve the comprehensibility of German MT technical documentation. Our findings also show that the introduction of CL rules did not lead to any significant improvement in the comprehensibility, usefulness, and acceptability of French MT technical documentation.

    TMX markup: a challenge when adapting SMT to the localisation environment

    Translation memory (TM) plays an important role in localisation workflows and is used as an efficient and fundamental tool to carry out translation. In recent years, statistical machine translation (SMT) techniques have developed rapidly, and translation quality and speed have improved significantly as well. However, when applying SMT techniques to facilitate post-editing in the localisation industry, we need to adapt SMT to TM data, which is formatted with special mark-up. In this paper, we explore some issues that arise when adapting SMT to Symantec-formatted TM data. Three different methods are proposed to handle the Translation Memory eXchange (TMX) markup, and a comparative study is carried out between them. Furthermore, we also compare the TMX-based SMT systems with a customised SYSTRAN system through human evaluation and automatic evaluation metrics. The experimental results on the French and English language pair show that SMT can perform well using TMX as input format either during training or at runtime.

    A detailed analysis of phrase-based and syntax-based machine translation: the search for systematic differences

    This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models, and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not necessarily generate output which is more grammatical than the output produced by the phrase-based models. Although the systems generate different output and can potentially be fruitfully combined, the lack of systematic difference between these models makes the combination task more challenging.

    Improving the post-editing experience using translation recommendation: a user study

    We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the effectiveness of the model as well as the reaction of potential users. Based on the performance statistics and the users’ comments, we find that translation recommendation can reduce the workload of professional post-editors and improve the acceptance of MT in the localization industry.

    Community-based post-editing of machine-translated content: monolingual vs. bilingual

    We carried out a machine-translation post-editing pilot study with users of an IT support forum community. For each of the two language pairs (English to German, English to French), four native speakers were recruited. They performed monolingual and bilingual post-editing tasks on machine-translated forum content. The post-edited content was evaluated using human evaluation (fluency, comprehensibility, fidelity). We found that monolingual post-editing can lead to improved fluency and comprehensibility scores similar to those achieved through bilingual post-editing, while fidelity improved considerably more in the bilingual set-up. Furthermore, performance varied greatly across post-editors, and it was found that some post-editors are able to produce better quality in a monolingual set-up than others.

    Qualitative analysis of post-editing for high quality machine translation

    In the context of massive adoption of Machine Translation (MT) by human localization services in Post-Editing (PE) workflows, we analyze the activity of post-editing high quality translations through a novel PE analysis methodology. We define and introduce a new unit for evaluating post-editing effort based on the Post-Editing Action (PEA), for which we provide human evaluation guidelines and propose a process to automatically evaluate these PEAs. We applied this methodology to data sets from two technologically different MT systems. In that context, we could show that more than 35% of the remaining effort can be saved by introducing global PEA and edit propagation.

    Foreebank: Syntactic Analysis of Customer Support Forums

    We present a new treebank of English and French technical forum content which has been annotated for grammatical errors and phrase structure. This double annotation allows us to empirically measure the effect of errors on parsing performance. While it is slightly easier to parse the corrected versions of the forum sentences, the errors are not the main factor in making this kind of text hard to parse.

    DCU-Symantec submission for the WMT 2012 quality estimation task

    This paper describes the features and the machine learning methods used by Dublin City University (DCU) and SYMANTEC for the WMT 2012 quality estimation task. Two sets of features are proposed: one constrained, i.e. respecting the data limitation suggested by the workshop organisers, and one unconstrained, i.e. using data or tools trained on data that was not provided by the workshop organisers. In total, more than 300 features were extracted and used to train classifiers in order to predict the translation quality of unseen data. In this paper, we focus on a subset of our feature set that we consider to be relatively novel: features based on a topic model built using the Latent Dirichlet Allocation approach, and features based on source and target language syntax extracted using part-of-speech (POS) taggers and parsers. We evaluate nine feature combinations using four classification-based and four regression-based machine learning techniques.

    DCU-Symantec at the WMT 2013 Quality Estimation Shared Task

    We describe the two systems submitted by the DCU-Symantec team to Task 1.1 of the WMT 2013 Shared Task on Quality Estimation for Machine Translation. Task 1.1 involves estimating post-editing effort for English-Spanish translation pairs in the news domain. The two systems use a wide variety of features, of which the most effective are the word-alignment, n-gram frequency, language model, POS-tag-based and pseudo-reference features. Both systems perform at a similarly high level in the two tasks of scoring and ranking translations, although there is some evidence that the systems are over-fitting to the training data.
